DeepSeek-R1 Local Deployment


This guide describes how to deploy the DeepSeek-R1 large language model locally on NVIDIA Jetson Orin devices using Ollama, a lightweight inference engine, enabling offline AI interaction with a simple and efficient installation process.


1. Overview

Large language models (LLMs) like DeepSeek-R1 are gradually becoming a core component of edge intelligence applications. Running them directly on Jetson Orin offers key benefits:

  • Fully offline operation
  • Low-latency response
  • Enhanced data privacy

This guide covers:

  • Environment preparation
  • Installing Ollama
  • Running DeepSeek-R1
  • (Optional) Using Open WebUI for a web-based interface

2. Environment Preparation

Hardware

Component   Requirement
Device      Jetson Orin (Nano / NX)
Memory      ≥ 8 GB (larger models require more)
Storage     ≥ 10 GB (varies by model size)
GPU         NVIDIA GPU with CUDA support

Software

  • Ubuntu 20.04 / 22.04 (JetPack 5.1.1+ recommended)
  • NVIDIA CUDA Toolkit and drivers (included with JetPack)
  • Docker (optional, for containerized deployment)

⚙️ Tip: Use jetson_clocks and check nvpmodel to enable maximum performance mode for the best inference results.
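
For example, the following commands switch to maximum performance mode and lock the clocks. Note that the nvpmodel mode ID for MAXN varies by device, so query the available modes first:

sudo nvpmodel -q      # query the current power mode
sudo nvpmodel -m 0    # mode 0 is MAXN on many Orin devices (verify for yours)
sudo jetson_clocks    # lock CPU/GPU/EMC clocks to their maximums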


3. Ollama Installation (Inference Engine)

Option A: Native Script Installation

Open a terminal and run the following command to install Ollama:

curl -fsSL https://ollama.com/install.sh | sh
  • Installs the Ollama service and CLI tools.
  • Automatically handles dependencies and configures the background service.
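
Once the script finishes, a quick sanity check confirms the CLI is on your $PATH (the version string will vary with the release you installed):

ollama --version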

Option B: Docker Deployment

sudo docker run --runtime=nvidia --rm --network=host \
    -v ~/ollama:/ollama \
    -e OLLAMA_MODELS=/ollama \
    dustynv/ollama:r36.4.0

🧩 The Docker version is maintained by the NVIDIA community (dustynv) and optimized for Jetson.
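
To confirm the container is up, list the running containers and look for the dustynv/ollama image:

sudo docker ps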

Verify Ollama is Running

ss -tuln | grep 11434

Expected output:

LISTEN 0 128 127.0.0.1:11434 ...

If port 11434 is listening, the Ollama service has started successfully.
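
You can also query Ollama's HTTP API directly; the version endpoint returns a small JSON document (the version number on your system will differ):

curl http://127.0.0.1:11434/api/version
# e.g. {"version":"0.5.7"}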


4. Running DeepSeek-R1

Starting the Model

To run the 1.5B parameter version:

ollama run deepseek-r1:1.5b
  • Ollama automatically downloads the model if it is not cached locally.
  • An interactive chat session then starts in the terminal.

💡 Depending on your hardware capability, you can replace 1.5b with 8b, 14b, etc.
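
Beyond the interactive mode, you can pass a prompt directly on the command line or call the REST API that Ollama serves on port 11434 (the prompt text here is only an illustration):

ollama run deepseek-r1:1.5b "Explain edge computing in one sentence."

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Explain edge computing in one sentence.",
  "stream": false
}'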

Model Version Comparison

Version   Memory Requirement   Notes
1.5B      ~6–8 GB              Suitable for Orin Nano/NX
8B+       ≥ 16 GB              Requires AGX Orin
70B       —                    🚫 Not supported on Jetson
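
Model files are large, so keep an eye on what is cached. Ollama's built-in commands list downloaded models and delete ones you no longer need:

ollama list                   # show downloaded models and their sizes
ollama rm deepseek-r1:1.5b    # remove a model to free storage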

5. Web Interface (Open WebUI)

Open WebUI provides a user-friendly, browser-based chat interface.

Install Open WebUI (using Docker)

sudo docker run -d --network=host \
    -v ${HOME}/open-webui:/app/backend/data \
    -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
    --name open-webui \
    --restart always \
    ghcr.io/open-webui/open-webui:main
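
If the interface does not come up, the container logs are the first place to look:

sudo docker logs -f open-webui    # press Ctrl+C to stop following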

Access the WebUI

Open the following URL in a browser on the device (with --network=host, Open WebUI listens on its default port, 8080):

http://localhost:8080/
  • Interact with the DeepSeek-R1 model through a graphical chat interface.
  • View conversation history and review model responses directly in the browser.

6. Performance Optimization

Optimization Area     Description
Memory Usage          Use a smaller model (e.g., 1.5B) or enable swap (see the sketch below)
Jetson Performance    Enable MAXN and run jetson_clocks
Model Caching         Ensure sufficient space in the ~/ollama directory
Runtime Monitoring    Use htop or tegrastats to monitor system load
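
A minimal sketch for adding an 8 GB swap file (the size is an assumption; adjust it to your storage, and remember that swap is far slower than RAM, so it mainly helps models that almost fit in memory):

sudo fallocate -l 8G /swapfile    # size chosen for illustration
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab    # persist across reboots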

📉 The initial model load may take about 30 seconds to 1 minute; subsequent runs will be faster thanks to caching.


7. Troubleshooting

Issue                      Solution
Port 11434 not listening   Restart Ollama or check the Docker container status
Model fails to load        Insufficient memory; try a smaller version (e.g., 1.5B)
Cannot access Web UI       Verify Docker is running and connected to the host network
Ollama command not found   Re-run the installation script or add Ollama to your $PATH
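
With the native script installation, Ollama runs as a systemd service, so restarting it and inspecting its logs looks like this (the service name assumes the standard install):

sudo systemctl restart ollama    # restart the service
sudo systemctl status ollama     # confirm it is active
journalctl -u ollama -f          # follow the service logs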

8. Appendix

Example Directory

~/ollama/         # Model cache directory
~/open-webui/     # WebUI persistent data
